
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Neural Information Processing Systems

In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, Learning from Observation (LfO) is more practical in leveraging previously inapplicable resources (e.g., videos), yet more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its differences from LfD from both theoretical and practical perspectives. We first prove that, when following the modeling approach of GAIL, the gap between LfD and LfO actually lies in the disagreement of the inverse dynamics models between the imitator and the expert. More importantly, the upper bound of this gap is revealed by a negative causal entropy, which can be minimized in a model-free way. Considerable empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.


Reviews: Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Neural Information Processing Systems

I was happy with your inclusion of experiments on manipulation tasks, and agree that they are convincing. I was also happy with your explanation of GAILfo vs. GAIL vs. your algorithm, and your discussion of Sun et al. (2019). Your decision to release code also helps allay any fears I had about reproducibility. I have raised my score to an 8 to reflect these improvements. It is easy to follow the logic and thoughts of the authors.


Reviews: Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Neural Information Processing Systems

Learning from Observation (LfO) is harder, but more practical, than Learning from Demonstration (LfD), which involves both action and state supervision. The paper studies the difference between the two types of learning from both theoretical and practical perspectives, and relates the gap between LfD and LfO to the inverse dynamics disagreement between the imitator and the expert. The paper includes an elaborate and interesting theoretical analysis of this gap, and proposes a method for bridging it through entropy maximization. The empirical evaluation is also thorough, comprising a toy problem for studying the effect of inverse dynamics discrepancy, MuJoCo benchmarks, and an ablation study. The reviewers agree that this is a good, technically sound paper.



Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

Yang, Chao, Ma, Xiaojian, Huang, Wenbing, Sun, Fuchun, Liu, Huaping, Huang, Junzhou, Gan, Chuang

Neural Information Processing Systems
